Multi-Scale Audio Spectrogram Transformer for Classroom Teaching Interaction Recognition

Abstract

Classroom interactivity is one of the important metrics for assessing classrooms, and identifying classroom interactivity through image data is limited by the interference of complex teaching scenarios. However, audio data within the classroom are characterized by significant student–teacher interaction. This study proposes a multi-scale audio spectrogram transformer (MAST) speech scene classification algorithm and constructs a classroom interactive audio dataset to achieve teacher–student interaction recognition in the classroom teaching process. First, the original speech signal is sampled and pre-processed to generate a multi-channel spectrogram, which enhances the representation of features compared with single-channel features. Second, in order to efficiently capture the long-range global context of the audio spectrogram, the audio features are globally modeled by the multi-head self-attention mechanism of MAST, and the feature resolution is reduced during feature extraction to continuously enrich the layer-level features while reducing model complexity. Finally, a further combination with a time-frequency enrichment module maps the final output to a class feature map, enabling accurate audio category recognition. The experimental comparison of MAST is carried out on public environment audio datasets and the self-built classroom audio interaction dataset. Compared with previous state-of-the-art methods on the public datasets AudioSet and ESC-50, its accuracy has been improved by 3% and 5%, respectively, and its accuracy on the self-built classroom audio interaction dataset has reached 92.1%. These results demonstrate the effectiveness of MAST in the field of general audio classification and in the smart classroom domain.
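
The pipeline outlined in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the spectrogram channels are built, so stacking a log-spectrogram with its first and second time differences is an assumption, as are the patch size, the random (untrained) projection weights, the 2x2 average pooling used to reduce resolution, and every function name. The time-frequency enrichment module and the classifier head are omitted.

```python
import numpy as np

def multi_channel_spectrogram(signal, n_fft=256, hop=128):
    """Log-magnitude STFT stacked with its 1st and 2nd time differences (assumed channels)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack([signal[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    spec = np.log1p(np.abs(np.fft.rfft(frames, axis=1)))   # (T, F)
    delta = np.gradient(spec, axis=0)
    delta2 = np.gradient(delta, axis=0)
    return np.stack([spec, delta, delta2])                  # (3, T, F)

def patchify(spec, patch):
    """Cut a (C, T, F) spectrogram into non-overlapping patches, one token per patch."""
    c, t, f = spec.shape
    gt, gf = t // patch, f // patch
    x = spec[:, :gt * patch, :gf * patch]
    x = x.reshape(c, gt, patch, gf, patch).transpose(1, 3, 0, 2, 4)
    return x.reshape(gt * gf, c * patch * patch), gt, gf

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(tokens, n_heads, rng):
    """Every token attends to every other token; weights are random, i.e. untrained."""
    n, d = tokens.shape
    d_head = d // n_heads
    wq, wk, wv, wo = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(4))
    split = lambda x: x.reshape(n, n_heads, d_head).transpose(1, 0, 2)  # (heads, n, d_head)
    q, k, v = split(tokens @ wq), split(tokens @ wk), split(tokens @ wv)
    attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))          # (heads, n, n)
    return (attn @ v).transpose(1, 0, 2).reshape(n, d) @ wo

def pool_tokens(tokens, gt, gf):
    """2x2 average pooling over the token grid: a coarser scale for the next stage."""
    d = tokens.shape[1]
    grid = tokens.reshape(gt, gf, d)[:gt // 2 * 2, :gf // 2 * 2]
    pooled = grid.reshape(gt // 2, 2, gf // 2, 2, d).mean(axis=(1, 3))
    return pooled.reshape(-1, d), gt // 2, gf // 2

rng = np.random.default_rng(0)
audio = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s synthetic tone at 16 kHz
spec = multi_channel_spectrogram(audio)
tokens, gt, gf = patchify(spec, patch=8)
tokens = multi_head_self_attention(tokens, n_heads=4, rng=rng)
coarse, gt2, gf2 = pool_tokens(tokens, gt, gf)
print(spec.shape, tokens.shape, coarse.shape)
```

Pooling between attention stages is what makes such a model multi-scale: each later stage attends over fewer, coarser tokens, which also lowers the quadratic cost of self-attention.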

Similar resources

Multi-scale Enveloping Spectrogram for Bearing Defect Detection

This paper presents a new signal processing technique for bearing defect detection, called Multi-Scale Enveloping Spectrogram (MUSENS). The technique decomposes vibration signals measured on rolling bearings into different scales by means of a continuous wavelet transform (CWT). The envelope signal in each scale is then calculated from the modulus of the wavelet coefficients. Subsequently, Four...

Universität Augsburg Audio Brush : Editing Audio in the Spectrogram

A tool for editing audio signals in the spectrogram is presented. It allows manipulating the spectrogram of a signal at any chosen time-frequency resolution directly and to reconstruct the edited signal in HiFi quality – a capability that is usually not possible with the Fourier or wavelet transformation. Image processing and computer vision methods are applied to the spectrogram in order to id...

Universität Augsburg Audio Brush : Smart Audio Editing in the Spectrogram

Starting with a novel audio analysis and editing paradigm, a set of new and adaptive audio analysis and editing algorithms in the spectrogram are developed and integrated into a smart visual audio editing tool in a “what you see is what you hear” style. At the core of our algorithms and methods is a very flexible audio spectrogram that goes beyond FFT and Wavelets and supports manipulating a si...

Audio-visual Interaction in Model Adaptation for Multi-modal Speech Recognition

This paper investigates audio-visual interaction, i.e. inter-modal influences, in linear-regressive model adaptation for multi-modal speech recognition. In the multi-modal adaptation, inter-modal information may contribute to the performance of speech recognition. Thus the influence and advantage of inter-modal elements should be examined. Experiments were conducted to evaluate several transformati...

Audio Spectrogram Representations for Processing with Convolutional Neural Networks

One of the decisions that arise when designing a neural network for any application is how the data should be represented in order to be presented to, and possibly generated by, a neural network. For audio, the choice is less obvious than it seems to be for visual images, and a variety of representations have been used for different applications including the raw digitized sample stream, hand-c...

Journal

Journal title: Future Internet

Year: 2023

ISSN: 1999-5903

DOI: https://doi.org/10.3390/fi15020065